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Abstract 

Bayesian inference is developed for matrix-variate dynamic linear models (MV-DLMs), 
in order to allow missing observation analysis, of any sub-vector or sub-matrix of the 
observation time series matrix. We propose modifications of the inverted Wishart and 
matrix t distributions, replacing the scalar degrees of freedom by a diagonal matrix of 
degrees of freedom. The MV-DLM is then re-dcfincd and modifications of the updating 
algorithm for missing observations are suggested. 

Some key words: Bayesian forecasting, dynamic models, inverted Wishart distribution, 
state space models. 

1 Introduction 

Suppose that, in the notation of West and Harrison (1997, Chapter 16), the p x r matrix- 
variate time series {yt} follows a matrix-variate dynamic linear model (MV-DLM) so that 

y', = Fl@t + e't and Qt = GtOt^i + cot, (1) 

where Ft is a dxr design matrix, Gt is a dx d evolution matrix and is a d x p state matrix. 
Conditional on a p x p covariance matrix S, the innovations and uJt follow, respectively, 
matrix-variate normal distributions (Dawid, 1981), i.e. 

et\^ ^ Nr>cpiO,Vt,^) and u;* |S ~ A^rfxp(0, S). 

This is equivalent to writing vec(et)|5] ~ Nrp{0, Vj) and vec(u;t)|5] ~ Ncip{0, S^VF^), where 
vec(.) denotes the column stacking operator of a matrix, ^ denotes the Kronecker product of 
two matrices and Nrp{., ■) denotes the multivariate normal distribution. 
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We assume that the innovation series {e^} and {ivt} are internally and mutually uncorre- 
lated and also they are uncorrelated with the assumed initial priors 

eolS ~ iVdxp("io,-Po, 5]) and S ~ /Wp(no, noSo), (2) 

for some known ttt-q, Pq, uq and Sq. Here S ~ IWp{nQ,nQSo) denotes the inverted Wishart 
distribution with no degrees of freedom and parameter matrix uqSo- The covariance matrices 
Vt and Wt are assumed known; usually Vt = Ir (the r x r identity matrix) and Wt can be 
specified using discount factors as in West and Harrison (1997, Chapter 6). Alternatively, 
Wt = W may be considered time-invariant and it can be estimated from the data using the 
EM algorithm (Dempster et al, 1977; Shumway and Stoffer, 1982). With the above initial 
priors ^ the posterior distribution of 0j |S, yi, . . . , is a matrix- variate normal distribution 
and the posterior distribution of . . . , is an inverted Wishart distribution with degrees 
of freedom rit = nt-i + 1 and a parameter matrix ritSt, which are calculated recurrently (West 
and Harrison, 1997, Chapter 16). 

Missing data in time series are typically handled by evaluating the likelihood function 
(Jones, 1980; Ljung, 1982; Shumway and Stoffer, 1982; Harvey and Pierse, 1984; Wincek and 
Reinsel, 1984; Kohn and Ansley, 1986; Ljung, 1993; Gomez and Maravall, 1994; Luceho, 1994; 
Luceho, 1997). In the context of model ([T|) a major obstacle in inference is when a sub- vector 
or sub-matrix yt of yt is missing at time t. Then the scalar degrees of freedom of the inverted 
Wishart distribution of . . . , y^, are incapable to include the information of the observed 
part of yt, but to exclude the influence of the missing part yt- For example consider p = 2 
and r = 1 or yt = [yu y2t]' and suppose that at time t, yu is missing {yt = yu), while y2t 
is observed. Let nt-i denote the degrees of freedom of the inverted Wishart distribution of 
. . . ,yt-i- One question is how one should update nt, since the information at time t is 
partial (one component observed and one missing). Likewise, given this partial information 
at time t, another question is how to estimate the off-diagonal elements of S, which leads to 
the estimation of the covariance of yu and y2t ■ 

In this paper, introducing several degrees of freedom that form a diagonal matrix, we 
propose modifications to the inverted Wishart and matrix t distributions. We prove the 
conjugacy between these distributions and we discuss modifications in the recursions of the 
posterior moments in the presence of missing data. This approach does not require to order all 
missing observations in one matrix (Shumway and Stoffer, 1982; Luceho, 1997) and therefore 
it can be applied for sequential purposes as new data are observed. 
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2 Matrix-variate dynamic linear models 

2.1 Modified inverted Wishart distribution 

Suppose that T, is a pxp random covariance matrix, S, R are pxp covariance matrices and N 
is a p X p diagonal matrix with positive diagonal elements. Let tr(.), etr(.) and |.| denote the 
trace, the exponent of the trace and the determinant of a square matrix, respectively. The 
density of the inverted Wishart distribution is given by 

= c|i?|(^^-?'-i)/2|s|-'=/2etr , (3) 

from which it is deduced that 

I |S|-^/2gtr(^-^i2S-i^ dE = c-i|i?|-('=-f-i)/2, (4) 

with O = {S G : S > 0}, = 2(''~P~^^P/^Tp{{k -p- l)/2}, and k > 2p, where Tp{.) 

is the multivariate gamma function. 

Lemma 1. The function 

where c does not depend on T,, is a density function. 
Proof. If the following bijective transformation is applied 

= ivV25Ari/2 and k = 2v+^-^, (6) 

p 

then ([5]) is directly obtained from ([3|). □ 

From the above bijection and the Wishart integral, we can see that the normalizing con- 
stant c is 

{2v+tr{N)/p-p-l}/2 

c = co|5|{2-+fW/f-f-i}/2 I J]: 

where 

^ 2(2v+tT{N)/p~p~i}p/2j. f 2v + tr{N)/p-p-l 
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for N = diag(ni, . . . ,np) and > (i = 1, . . . ,p). 

Density ([5|) proposes a modification of the inverted Wishart distribution in order to in- 
corporate a diagonal matrix of degrees of freedom. The modification consists of a bijective 
transform of the two distributions. We will then say that S follows the modified inverted 
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Wishart distribution and we will write T, ~ MIWp{S, N, v), where u is a scalar liyperparam- 
eter. Note that when ni = ■ ■ ■ = Up = n and v = p, the above distribution reduces to an 
inverted Wishart distribution with n degrees of freedom. 

With k and R as defined in equation ([6]), the mean of S is 

E{J:) = = / + 2v-2p-2\ ' N^/^SN^/\ 

k — 2p — 2 I p J 

for p~^tT{N) > 2p—2v+2. The next result gives the distribution of a MIW matrix conditional 
on a normal matrix. 

Proposition 1. Let Y be an r x p random matrix that follows a matrix normal distribution, 
conditional on S, and a p x p covariance random matrix that follows a modified inverted 
Wishart distribution, written Y\T, ~ Nry.p{m,P,Ti) and S ~ MIWp{S,N,v) respectively, for 
some known quantities m, P, S, N, and v. Then, the conditional distribution o/S given Y, 
is 

J:\Y ^ MIWp{S*,N*,v), 
where N*^/^S*N*^/^ = {Y - m)'p-^(y - m) + N^/^SN^^^ and N* = N + rip. 
Proof. Form the joint distribution of Y and S and write 

p(s|y) oc p(y,s)=p(y|s)p(s) 



^{{Y - fyQ-\Y - f) 



(7) 



which is sufficient for the proof with the definition of S* and N* . □ 

In the context of Proposition [1] the joint distribution of Y and S is referred to as joint nor- 
mal modified inverted Wishart distribution with notation y, S ~ N MIWry.p^p{m, P, S, N, v), 
for m, P, S, N, and v as defined in Proposition [TJ The next result gives the marginal 
distribution of Y. First we give some background material on the matrix t distribution. 

Let X be an r X p random matrix. Then, the matrix t distribution is defined by 

p{X) = c\Q + {X- M)'P~^{X - M)|-('=+^-+P-i)/2, (8) 

with 

_ Tp{{k + r + p- l)/2}|Q|(^+P-i)/2|P|-?'/^ 
7:-P/^Tp{{k+p-l)/2] ' 

where M is an r x p matrix, P a r x r covariance matrix, Q a p x p covariance matrix, and 

k any positive real number. 
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Proposition 2. Let Y be an r x p random matrix that follows a matrix normal distribution 
conditional on S, and T, be a px p covariance random matrix that follows a modified inverted 
Wishart distribution, written Y\y, ~ Nrxpif^Q,^), o,nd S MIWp{S, N,v) respectively, for 
known quantities f, Q, S, N, and v. Then, the marginal distribution ofY is 



p(Y) = C\N^/^SN^/^ + (y - fyQ~\Y - f)\~{^-+tr{N)/p+d-p-l}/2^ 



(9) 



which by analogy of the M IW distribution, is a modification of the matrix t distribution and 
it is written as AIT{f,Q,S,N,v). 

Proof. The joint distribution of Y and S is given by equation d?]). Hence, the marginal 
distribution of Y is 



p{Y) 



p{Y, S) dS, 



where n = {T. e : E > 0}. Set R = {Y - f)'Q~^{Y - f) + N^/'^SN^/'^ and k = 

2v + r + tr {N)/p and from equation (jj]) we have equation ([9]). □ 

The distribution of Proposition ([2]) can be derived from the matrix t distribution (see 
equation ([8])). The normalizing constant c of ([9]) is obtainable from ([8]) as 



Tp{{k + r+p-l)/2} 




(fc+p-l)/2 



\Q\ 



where = diag(ni, . . . , Up) and k = 2v — 2p + tr{N)/p. Note that if all the diagonal elements 
of are the same (i.e. ui = ■ ■ ■ = Up = n) and v = p, then the above distribution reduces to 
a matrix t distribution with n degrees of freedom. 

Finally we give the marginal distribution of S. Consider the following partition of S, S, 
and N 





Sl2 


, s = 


Su 


Sl2 


, N = 


' Ni " 


^12 


5^22 




'-'12 


S22 




_ 0' N2 _ 



where Su, and A^n have dimension q x q, for some 1 < q < p- The next result gives the 
marginal distribution of Su. 

Proposition 3. If T, ^ MIWp{S,N,v), under the above partition of Tj the distribution of 
Sii is Sii ~ MIWq{Sii,Nii,vi), where vi = v - p + q + 2'^ p~^ tr{N) - 2'^^ q~^ tr{Ni) . 

Proof. The proof suggests the adoption of transformation ([6|) together with the partition of 
i? in (131) as 

Ru R12 

K = 

R12 R22 
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Using marginalization properties of the inverted Wishart distribution, upon noticing 
Ari/25^1/2 ^ 



we get Ell ~ MIWq{Sii, Ni,vi), with vi as required. □ 

A similar result can be obtained for T,22- Consequently, if we write S = {cij} (1 < i,j < p) 
and N = diag(ni, . . . ,np), then the diagonal variances an follow modified inverted Wishart 
distributions, an ~ MIWi{sii,ni, Vi), where Vi = v — p+l + 2~^p~^t'i:{N) — 2~^?ij. These in 
fact are inverted gamma distributions an ~ IG{vi + ni/2 — 1, njSjj/2). Note that if ni = • • • = 
Up = n and v = p, then we have that an ~ IG{n/2, nsn/2) (the inverted gamma distribution 
used in West and Harrison (1997) when p = 1). 

We close this section with a brief discussion on an earlier study proposing the incorpora- 
tion of several degrees of freedom for inverted Wishart matrices (Brown et al, 1994). This 
approach is based on breaking the degrees of freedom on blocks and requiring for each block 
the marginal density of the covariance matrix to follow an inverted Wishart distribution. 
However, in that framework the conjugacy between the normal and that distribution is lost 
and as a result the proposed estimation procedure may be slow and probably not suitable for 
time series application. Relevant inferential issues of that approach are discussed in Garth- 
waite and Al-Awadhi (2001). Our proposal of the MIW distribution retains the desired 
conjugacy and it leads to relevant modifications of the matrix t distribution, which provides 
the forecast distribution. Furthermore, the MIW density leads to fast computationally effi- 
cient algorithms, which are suitable for sequential model monitoring and expert intervention 
(Salvador and Gargallo, 2004). Finally, according to Proposition [3l the marginal distributions 
of MIW matrices are also MIW, which means that several degrees of freedom are included 
in the marginal models too, something that is not the case in the approach of Brown et al. 
(1994). 

2.2 Matrix- variate dynamic linear models revisited 

We consider model ([T]), but now we replace the initial priors ^ by the priors 

eo|S~iVrfxp(mo,i^o,So) and Sq ~ M/VFp(So, A^cp), (10) 

for some known niQ, Pq, Sq and Nq. Practically we have replaced the inverted Wishart prior 
by the MIW and so, for each t = 1, . . . , T, we use p degrees of freedom nu, . . . , Upt in order 
to estimate where denotes the information set, comprising of observed data yi, . . . ,yt. 
The next result provides the posterior and forecast distributions of the new MV-DLM. 
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Proposition 4. One-step forecast and posterior distributions in the model ^^ with the initial 
priors U0\) . are given, for each t, as follows. 

(a) Posterior at t - 1 : et_i,S|y*-i ^ NMIWdxpA^t-i, Pt-i, St-i, Nt-i,p), 
for some mt^i, Pt^i, St^i and Nf^i. 

(b) Prior at t : &t,^y''^ - NMIWdxpA^'t, Rt, St-i, Nt^i,p), 

where at = Gtrrit^i and Rt = GtPt-iG[ + Wt- 

(c) One- step forecast at t: y[\Ti,y^~^ Nrxpift^Qt,^), 

with marginal: y't\y^~^ ~ MTrxpifl,Qt, St-i, Nt-i,p), 
where f[ = Flat and Qt = FlRtFt + Vt. 

(d) Posterior at t : et,S|y* ~ NMIWdxp,p{'^uPt.St,Nt,p), 

with 

mt = at + Ate't, Pt = Rt - AtQtA't, 
Nt = iVt-i + rip, Nl^^StNl'^ = NlL\St^iNlL\ + ctQt^e't, 
At = RtFtQt^, and et = yt - ft- 

The proof of this resuh fohows immediately from Propositions [T] and [5J For t = 1, 
(a) coincides with the priors (jlOp . From Proposition [21 the marginal posterior of Qt\y'' is 
B(|y* ~ MTdxpiiTT-t, Pt, St, Nt,p). Thus the above proposition gives a recursive algorithm for 
the estimation and forecasting of the system for all t = 1, . . . , T. 

Proposition m gives a generalization of the updating recursions of matrix- variate dynamic 
models (West and Harrison, 1997, Chapter 16). The main difference of the two algorithms 
is that the scalar degrees of freedom ut of the standard recursions are replaced by Nt in 
the above proposition and that the inverted Wishart distribution is replaced by the modified 
inverted Wishart distribution (in order to account for the matrix of degrees of freedom). As 
a result the classical Bayesian updating of West and Harrison (1997) is obtained as a special 
case of the distributional results of Proposition U by setting Nt = ntip = diag(nt, . . . ,nt) 
{t = 0,1, ... ,T), where nt represent the scalar degrees of freedom of the inverted Wishart 
distribution of and uq is the initial degrees of freedom. 

3 Missing observations 

In this section we consider missing observations at random. Our approach is based on ex- 
cluding any missing values of the calculation of the updating equations (state and forecast 
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distributions) thus excluding the unknown influence of these unobserved variables. This ap- 
proach is explained for univariate dynamic models in West and Harrison (1997, Chapters 
4,10). 

The univariate dynamic linear model with unknown observational variance is obtained 
from model ([1]) for p = r = 1. In this case the posterior recursions of mt, Pt and St of West 
and Harrison (1997, Chapter 4) follow from Proposition [Has a special case. Now suppose that 
at time t the scalar observation yt is missing so that = It is then obvious that the 

posterior distribution of 0j equals its prior distribution (since no information comes in to the 
system at time t). Then we have mt = at, Pt = Rt, St = St~i and Nt = rit = nt^i = Nt^i- 
To incorporate this into the updating equations of the posterior means and variances, we can 
write mt = at- AtetUt, Pt = Rt - AtA[QtUt, ritSt = nt-iSt^i + ejut/Qt and nt = nt„i + ut, 
where ut is zero, if yt is missing and ut = 1, if yt is observed. So when p = \ the inclusion of 
ut in the posterior recursions leads to identical analysis as in West and Harrison (1997) and 
in references therein. The introduction of ut in the recursions automates the posterior /prior 
updating in the presence of missing values and it motivates the case for p, r > 1. 

Moving to the multivariate case, first we consider model ([T]) as defined in the previous 
section with r = 1. Assume that we observe all the p x 1 vectors i = 1, . . . , t — 1. At time t 
some observations are missing (sub- vectors of yt, or the entire yt). To distinguish the former 
from the latter case we have the following definition. 

Definition 1. A partial missing observation vector is said to he any strictly sub-vector of the 
observation vector that is missing. If the entire observation vector is missing it is referred to 
as full missing observation vector. 

Considering the MV-DLM ([T]), it is clear that in the case of a full missing vector we have 

et,^y' NMIWdy,p,p{mt,Pt,St,Nt,p), (11) 

where mt = at, Pt = Rt, St = St-i, Nt = Nt-i, since no information comes in at time t. This 
equation relates to the standard posterior distribution of West and Harrison (1997) by setting 
Nt = diag(nj, . . . , nt), for a scalar nt > and evidently reducing the MIW distribution by a 
IW distribution. If one starts with a prior Nq = diag(no, . . . , uq), and assuming that at some 
time t, there is a full missing vector yt, then it is clear that the posterior (jlip equals to the 
posterior of @t, ^\y^ using the standard recursions (West and Harrison, 1997). Any differences 
between the two algorithms is highlighted only by observing partial missing vectors and this 
has been the motivation of the new algorithm. 

Define a p x p diagonal matrix Ut = diag(«if, . . . , ipt) with 




1 if yjt is observed, 
if yjt is missing, 



8 



for all I <j <p, where yt = [yu ■■■ Upt]' ■ 

Then, the posterior distribution (jlip still applies with recurrences 

mt = at + Ate[Ut (12) 

Pt = Rt- AtA[QtUt (13) 

Nt = Nt^i + Ut (14) 

Nl^^StNl^^ = NlHSt^iNlH + UtetQT'e'tUt, (15) 

where ut = tr{Ut)/p- Some explanation for the above formulae are in order. 

First note that if no missing observation occurs Ut = Ip, ut = 1 and we have the standard 
recurrences as in Proposition SI On the other extreme (full missing vector), Ut = 0, ut = 
and we have equation (jlip . Consider now the case of partial missing observations. Equation 
(|14|) is the natural extension of the single degrees of freedom updating, see West and Harrison 
(1997, Chapter 16). For equation (|12p note that the zero's of the main diagonal of Ut convey 
the idea that the corresponding to the missing values elements of mt remain unchanged and 
equal to at- For example, consider the case of p = 2, d = 2 and assume that you observe yu, 
but y2t is missing. Then 

Ait{yit-fit) 
A2t{yit - fit) 

where At = [An A2t]'- The zero's on the right hand side reveal that the second column of mt 
is the same as the second column of at. Similar comments apply for equations (jl3p and (jlSp . 

Considering the case of r > 2, we define Ukt to be the diagonal matrix Ukt = diag(iifc^t, . . . , ipk,t) 
with 

1 if yjk,t is observed, 
if yjk,t is missing, 
where yt = {yjk,t}, {j = I, ■ ■ ■ ,p;k = I, . . . ,r). 

Then, the moments of equation (jlip can be updated via 

r r 

mt = at + Ate't JJ Ukt, Pt = Rt - AtQtA'tUt, Nt = Nt^i + Ukt 

k=l k=l 



mt = at + 



1jk,t 



Ni^stNi" = Ninst^xn + n ^'^O n ^^^^ 



,1/2^^,1/2 _ ^,1/2^ ^,1/2^ 

\A:=1 / \fc=l / 

where ut = ^'<^(Ji\k=i Ukt)/p- Similar comments as in the case of r = 1 apply. Definition [1] is 
trivially extended in the case when observations form a matrix (r > 2). 

We illustrate the proposed methodology by considering simulated data, consisting of 100 
bivariate time series yi, . . . , yioo, generated from a local level model yt = [yu y2t]' = ipt+^t and 
ipt = V'i-i +Ci) where ^t and Q are all simulated from bivariate normal distributions. The 
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Figure 1: Simulated bivariate time series (solid line) with one-step forecasts from (a) the 
standard DLM recursions (dotted/dashed line) and (b) the new DLM recursions (dashed 
line). The gaps indicate missing values. 

correlation of eu and e2t is set to 0.8, while the elements of Q are uncorrelated. This model 
is a special case of model ([1]) with Q[Ft = tpt and Gt = h- Figured] (solid line) shows the 
simulated data; the gaps in this figure indicate missing values at times t = 24, 43, 60, 75, 86. 
At times t = 24, 43, 86, yt2 is only missing (partial missing vectors), at time t = 75, yti is only 
missing (partial missing vector) and at time t = 60, both yti,yt2 are missing (full missing 
vector). For this data set, we compare the performance of recursions (|12|) -(|15|) with that of 
the classic or old recursions of West and Harrison (1997), which assume that when there is at 
least one missing value we set Ut = and ut = 0. For example using the old recursions, for 
t = 24 one would set U24 = and U24 = 0, losing the "partial" information of 1/24,1 = —3.739, 
which is observed. On the other hand, the new recursions would suggest for t = 24 to set 



24 



1 




and ii24 = 1/2. 



Figure [T] shows the one-step forecast mean of {yt} using the new recursions (dashed line) 
and the old recursions (dotted/dashed line). We observe that the new method produces a 
clear improvement in the forecasts as the old recursions provide poor forecasts, especially in 
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the low panel of Figure [T] (for {yi^}). What is really happening in this case is that, under 
the old recursions, the missing values of y2t affect the recursions for yu, since the observed 
information at yu is wrongly "masked" or "ignored" for the points of time when y2t is missing. 
On the other hand, the new recursions use the explicit information from each sub-vector of yt 
and thus the new recursions result in a notably more accurate forecast performance. This is 
backed by the mean square standardized forecast error vector, which for the new recursions 
is [1.300 1.825]', while for the old recursions is [1.545 2.182]'. Under the old recursions we can 
not obtain an estimate of the covariance between an observed yu and a missing y2t- However, 
this is indeed obtained under the proposed new recursions and so the respective correlations 
at points of time where there are gaps are 0.633 (at t = 24), 0.779 (at t = 43), 0.812 (at 
t = 75) and 0.809 (at t = 86); the mean of these correlations is 0.792, which is close to the 
real 0.8 under the simulation experiment. 
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